Things to avoid

Allocating memory/creating objects in the Draw/Update events

As much as possible, avoid allocating memory or creating objects (with Dim, Initialize, LoadBitmap, CreateScaledBitmap, etc.) in the Update and Draw events. These events are fired repeatedly, so every allocation made there lowers performance. All objects (bitmaps, arrays, etc.) should be initialized and ready to use before these events are fired.

Modifying views in the Draw event

Never change a view property (e.g. the text of a label or the width of an ImageView) in the Draw event, because it severely lowers performance. Make these changes in the Update event instead.

What's the fastest method to...?

Draw a bitmap:

Average FPS for the Perf_Smileys example on a Nexus 7 with Jelly Bean 4.2.2 (settings 15/500 with hardware acceleration):
DrawBitmapAt: 42
DrawObjectAt with a bitmap object: 41
DrawBitmapWithMatrixAt with Filter=False: 37
DrawBitmap: 34
DrawBitmapWithMatrixAt with Filter=True: 30

Average FPS on a Huawei Honor with Gingerbread 2.3.6 (settings 15/150 without hardware acceleration):
DrawBitmapAt: 41
DrawObjectAt with a bitmap object: 41
DrawBitmapWithMatrixAt with Filter=False: 41
DrawBitmapWithMatrixAt with Filter=True: 40
DrawBitmap: 39

Rotate a bitmap:

I modified the Perf_Smileys example with the following code:
Code 1:
    AC.MatrixSetRotate2(45, 32dip, 32dip) ' rotates 45 degrees around a pivot given relative to the bitmap's top-left corner
    AC.DrawBitmapWithMatrixAt(bmpSmiley, Smiley.x, Smiley.y, Filter)

Code 2:
    AC.SaveState
    AC.RotateCanvasAround(45, Smiley.x + 32dip, Smiley.y + 32dip) ' same pivot, expressed in canvas coordinates
    AC.DrawBitmapAt(bmpSmiley, Smiley.x, Smiley.y)
    AC.RestoreState

Average FPS on a Nexus 7 with Jelly Bean 4.2.2 (settings 15/500 with hardware acceleration):
Code 1 with Filter=False: 33
Code 1 with Filter=True: 30
Code 2: 30
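Both snippets should put the smiley in the same place. The Python sketch below (a plain-geometry illustration, not Accelerated Surface code) checks this, assuming the smiley is 64dip square (so 32dip is its center, consistent with the 128dip rectangle used when scaling it by 2), that dip values can be treated as plain numbers, and that DrawBitmapWithMatrixAt applies the matrix in the bitmap's own coordinates before translating the result to (x, y). The function names are hypothetical.

```python
import math

HALF = 32  # assumed: the smiley is 64dip square, so (32, 32) is its center

def rotate_around(point, angle_deg, pivot):
    """Rotate `point` by `angle_deg` degrees around `pivot` (clockwise on a y-down screen)."""
    a = math.radians(angle_deg)
    dx, dy = point[0] - pivot[0], point[1] - pivot[1]
    return (pivot[0] + dx * math.cos(a) - dy * math.sin(a),
            pivot[1] + dx * math.sin(a) + dy * math.cos(a))

def code1_position(local_pt, x, y, angle=45):
    # Matrix approach: rotate in the bitmap's own coordinates around its center,
    # then translate the result to (x, y) (assumed behavior of DrawBitmapWithMatrixAt).
    rx, ry = rotate_around(local_pt, angle, (HALF, HALF))
    return (rx + x, ry + y)

def code2_position(local_pt, x, y, angle=45):
    # Canvas approach: the bitmap is drawn at (x, y) on a canvas rotated
    # around (x + HALF, y + HALF).
    return rotate_around((local_pt[0] + x, local_pt[1] + y), angle, (x + HALF, y + HALF))

# Compare where each corner of the bitmap lands for a smiley drawn at (100, 200).
for corner in [(0, 0), (64, 0), (0, 64), (64, 64)]:
    p1 = code1_position(corner, 100, 200)
    p2 = code2_position(corner, 100, 200)
    print(corner, [round(v, 6) for v in p1], [round(v, 6) for v in p2])
```

Algebraically, Code 1 gives pivot + R(p - pivot) + (x, y) and Code 2 gives (pivot + (x, y)) + R(p - pivot), which is the same point, so the two codes differ in how the drawing is processed, not in where the pixels land.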

Scale a bitmap:

I modified the Perf_Smileys example with the following code:
Code 1:
    AC.MatrixSetScale(2, 2)
    AC.DrawBitmapWithMatrixAt(bmpSmiley, Smiley.x, Smiley.y, Filter)

Code 2:
    AC.SaveState
    AC.ScaleCanvas(2, 2)
    AC.DrawBitmapAt(bmpSmiley, Smiley.x / 2, Smiley.y / 2) ' halved because ScaleCanvas scales positions as well as sizes
    AC.RestoreState

Code 3:
    R.Initialize(Smiley.x, Smiley.y, Smiley.x + 128dip, Smiley.y + 128dip)
    AC.DrawBitmap(bmpSmiley, Null, R)

Average FPS on a Nexus 7 with Jelly Bean 4.2.2 (settings 15/500 with hardware acceleration):
Code 2: 22
Code 1 with Filter=False: 22
Code 1 with Filter=True: 22
Code 3: 21
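Code 2 divides the coordinates by 2 because scaling the canvas also scales positions. The Python sketch below (plain arithmetic, not library code) shows this, assuming ScaleCanvas scales around the canvas origin (0, 0), as Android's Canvas.scale(sx, sy) does, and that the smiley is 64dip square, which Code 3's 128dip rectangle suggests. The function names are hypothetical.

```python
SMILEY = 64  # assumed smiley size in dip (Code 3 stretches it to 128dip)

def scaled_canvas_rect(draw_x, draw_y, scale=2):
    """Screen rectangle (left, top, right, bottom) covered by the smiley drawn at
    (draw_x, draw_y) on a canvas scaled by `scale` around the origin (0, 0)."""
    left, top = draw_x * scale, draw_y * scale
    return (left, top, left + SMILEY * scale, top + SMILEY * scale)

def code3_rect(x, y):
    # Code 3: explicit destination rectangle; the bitmap is stretched to fill it.
    return (x, y, x + 2 * SMILEY, y + 2 * SMILEY)

x, y = 100, 200
print(scaled_canvas_rect(x / 2, y / 2))  # Code 2, coordinates halved → (100.0, 200.0, 228.0, 328.0)
print(scaled_canvas_rect(x, y))          # without halving → (200, 400, 328, 528)
print(code3_rect(x, y))                  # → (100, 200, 228, 328)
```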

SetXferMode and Porter-Duff modes

The result of SetXferMode may vary from one OS version to another when hardware acceleration is enabled, so it is advisable to use this function with acceleration disabled. However, some modes are supported by all OS versions and can benefit from hardware acceleration without issue (those marked "Yes" below):

Mode        Works in all circumstances
Clear       No
Darken      No
Dst_ATop    No
Dst_In      No
Dst_Out     Yes
Dst_Over    Yes
Lighten     No
Multiply    No
Screen      Yes
Src_ATop    Yes
Src_In      No
Src_Out     No
Src_Over    Yes
XOR         Yes

The source object (Src) is the object with the XferMode set. The destination (Dst) is everything that the source object intersects or overlaps, including the background of your surface. If you want the destination to be a specific object (or group of objects), draw the Src and Dst objects on a separate layer created with CreateLayer, then transfer the result to the canvas with TransferLayer (see the Objects'n'Modes example).
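To see what a mode does to a single pixel, here is a Python sketch of some of the Porter-Duff formulas. It assumes the modes follow the standard Porter-Duff definitions (as documented for Android's PorterDuff.Mode) and works on premultiplied values: each pixel is a pair (alpha, color) where color has already been multiplied by alpha, and a real pixel applies the color formula to each of R, G and B. Only a subset of the modes in the table is shown; the names MODES and composite are hypothetical.

```python
# Porter-Duff compositing on premultiplied (alpha, color) pairs, values in [0, 1].
# src is the pixel drawn with the XferMode set; dst is what is already on the surface.
MODES = {
    "Clear":    lambda sa, sc, da, dc: (0.0, 0.0),
    "Src_Over": lambda sa, sc, da, dc: (sa + da * (1 - sa), sc + dc * (1 - sa)),
    "Dst_Over": lambda sa, sc, da, dc: (da + sa * (1 - da), dc + sc * (1 - da)),
    "Src_In":   lambda sa, sc, da, dc: (sa * da, sc * da),
    "Dst_Out":  lambda sa, sc, da, dc: (da * (1 - sa), dc * (1 - sa)),
    "Src_ATop": lambda sa, sc, da, dc: (da, sc * da + dc * (1 - sa)),
    "XOR":      lambda sa, sc, da, dc: (sa * (1 - da) + da * (1 - sa),
                                        sc * (1 - da) + dc * (1 - sa)),
}

def composite(mode, src, dst):
    """Return the resulting (alpha, color) when `src` is drawn over `dst` with `mode`."""
    (sa, sc), (da, dc) = src, dst
    return MODES[mode](sa, sc, da, dc)
```

For example, Dst_Out with an opaque source erases the destination where they overlap: composite("Dst_Out", (1.0, 1.0), (1.0, 0.3)) returns (0.0, 0.0).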

Using a bitmap instead of drawing the background

If you repeatedly draw the same scene as the background of your surface, you can save the result as a bitmap (see the Bitmap property) and set this bitmap as the background of the surface, which improves performance. Beware, however: you cannot read the Bitmap property in the Draw event, and you should set the background outside of it.
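Why this helps can be seen with a crude count of draw calls. The Python sketch below is only a model, not library code: it assumes every static element costs one draw call and the cached background costs a single bitmap draw per frame, whereas real costs also depend on what each element draws and on hardware acceleration. The function name and the scene numbers are hypothetical.

```python
def draw_calls_per_frame(static_elements, moving_objects, background_cached):
    """Crude cost model (an assumption): count the draw calls issued for one frame.

    Without caching, every element of the unchanging background scene is redrawn.
    With caching, the whole scene was rendered once into a bitmap beforehand,
    so it costs a single bitmap draw per frame.
    """
    background_calls = 1 if background_cached else static_elements
    return background_calls + moving_objects

# Hypothetical scene: 200 static shapes behind 15 moving smileys.
print(draw_calls_per_frame(200, 15, background_cached=False))  # → 215
print(draw_calls_per_frame(200, 15, background_cached=True))   # → 16
```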

Saving memory

Two effective ways to reduce memory consumption are LoadScaledBitmap, which loads your bitmaps at their exact size on screen (prefer this function to LoadBitmapSample in all cases), and ReduceColors, which reduces their color range (the difference is subtle and should go unnoticed in most cases). ReduceColors also removes the alpha channel, so it is not suited to images with transparent parts.
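The savings can be estimated with simple arithmetic, since an uncompressed bitmap uses width x height x bytes-per-pixel. The Python sketch below assumes 4 bytes per pixel for a normally loaded bitmap (ARGB_8888, Android's default bitmap config) and assumes ReduceColors produces a 16-bit format without alpha such as RGB_565; the exact format ReduceColors uses is not stated here. The image sizes are hypothetical.

```python
BYTES_PER_PIXEL = {
    "ARGB_8888": 4,  # 8 bits each for alpha, red, green, blue (Android's default bitmap config)
    "RGB_565": 2,    # 16 bits, no alpha channel (assumed result of ReduceColors)
}

def bitmap_bytes(width, height, config="ARGB_8888"):
    """Memory used by an uncompressed bitmap of width x height pixels."""
    return width * height * BYTES_PER_PIXEL[config]

full = bitmap_bytes(2048, 1536)              # hypothetical image loaded at its file resolution
on_screen = bitmap_bytes(512, 384)           # same image loaded at its on-screen size
reduced = bitmap_bytes(512, 384, "RGB_565")  # on-screen size with reduced colors

print(full, on_screen, reduced)  # → 12582912 786432 393216
print(full // reduced)           # → 32
```

In this example, loading at the on-screen size divides memory use by 16, and halving the bytes per pixel divides it by 2 more.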